Abstract

Classical game theory treats players as special: a description of a game contains a full, explicit enumeration of all players, even though in the real world, “players” are no more fundamentally special than rocks or clouds. It isn’t trivial to find a decision-theoretic foundation for game theory in which an agent’s coplayers are a non-distinguished part of the agent’s environment. Attempts to model both players and the environment as Turing machines, for example, fail for standard diagonalization reasons.

In this paper, we introduce a “reflective” type of oracle, which is able to answer questions about the outputs of oracle machines with access to the same oracle. These oracles avoid diagonalization by answering some queries randomly. We show that machines with access to a reflective oracle can be used to define rational agents using causal decision theory. These agents model their environment as a probabilistic oracle machine, which may contain other agents as a non-distinguished part.

We show that if such agents interact, they will play a Nash equilibrium, with the randomization in mixed strategies coming from the randomization in the oracle’s answers. This can be seen as providing a foundation for classical game theory in which players aren’t special.


1  Introduction 

Classical decision theory and game theory are founded on the notion of a perfect Bayesian reasoner [2]. Such an agent may be uncertain which of several possible worlds describes the state of its environment, but given any particular possible world, it is able to deduce exactly what outcome each of its available actions will produce [3]. This assumption is, of course, unrealistic [4, 5]: Agents in the real world must necessarily be boundedly rational reasoners, which make decisions with finite computational resources. Nevertheless, the notion of a perfect Bayesian reasoner provides an analytically tractable first approximation to the behavior of real-world agents, and underlies an enormous body of work in statistics [6], economics [7], computer science [8], and other fields.

On closer examination, however, the assumption that agents can compute what outcome each of their actions leads to in every possible world is troublesome even if we assume that agents have unbounded computing power. For example, consider the game of Matching Pennies, in which two players each choose between two actions (“heads” and “tails”); if the players choose the same action, the first player wins a dollar, and if they choose differently, the second player wins. Suppose further that both players’ decision-making processes are Turing machines with unlimited computing power. Finally, suppose that both players know the exact state of the universe at the time they begin deliberating about the actions they are going to choose, including the source code of their opponent’s decision-making algorithm.¹

In this set-up, by assumption, both agents know exactly which possible world they are in. Suppose that they are able to use this information to accurately predict their opponent’s behavior. Since both players’ decision-making processes are deterministic Turing machines, their behavior is deterministic given the initial state of the world; each player either definitely plays “heads” or definitely plays “tails”. But neither of these possibilities is consistent: For example, if the first player chooses heads and the second player can predict this, the second player will choose tails, but if the first player can predict this in turn, it will choose tails, contradicting the assumption that it chooses heads.
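The inconsistency can be checked mechanically. The following is a minimal sketch, assuming the winner's dollar is normalized to a payoff of 1 and the loser's payoff to 0; the function names are illustrative and not part of the paper. It enumerates all four deterministic action profiles and tests whether either player would want to deviate.

```python
from itertools import product

ACTIONS = ("heads", "tails")

def payoff(first, second):
    """Return (payoff to player 1, payoff to player 2) in Matching Pennies."""
    # Assumption: the winner's dollar is normalized to 1, the loser gets 0.
    return (1, 0) if first == second else (0, 1)

def is_pure_equilibrium(first, second):
    """True if neither player can gain by unilaterally switching actions."""
    u1, u2 = payoff(first, second)
    best1 = max(payoff(a, second)[0] for a in ACTIONS)
    best2 = max(payoff(first, a)[1] for a in ACTIONS)
    return u1 == best1 and u2 == best2

# Collect every deterministic profile that is stable against deviations.
stable_profiles = [p for p in product(ACTIONS, repeat=2) if is_pure_equilibrium(*p)]
print(stable_profiles)
```

No profile survives the check, which matches the argument above: whichever pair of deterministic choices is assumed, one of the two players would prefer to switch.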

The problem is caused by the assumption that given its opponent’s source code, a player can figure out what action the opponent will choose. One might think that it could simply run its opponent’s source code, but if the opponent does the same, both programs will go into an infinite loop. Binmore [10], discussing the philosophical justification for game-theoretic concepts such as Nash equilibrium, puts this problem as follows:

In any case, if Turing machines are used to model the players, it is possible to suppose that the play of a game is prefixed by an exchange of the players’ Gödel numbers. ... Within this framework, a perfectly rational machine ought presumably to be able to predict the behavior of the opposing machines perfectly, since it will be familiar with every detail of their design. And a universal Turing machine can do this. What it cannot do is predict its opponents’ behavior perfectly and simultaneously participate in the action of the game. It is in this sense that the claim that perfect rationality is an unattainable ideal is to be understood.

¹The technique of quining (Kleene’s second recursion theorem [9]) shows that it is possible to write two programs that have access to each other’s source code.

Even giving the players access to a halting oracle does not help, because even though a machine with access to a halting oracle can predict the behavior of an ordinary Turing machine, it cannot in general predict the behavior of another oracle machine.

Classical game theory resolves this problem by allowing players to choose mixed strategies (probability distributions over actions); for example, the unique Nash equilibrium of Matching Pennies is for each player to assign “heads” and “tails” probability 0.5 each. However, instead of treating players’ decision-making algorithms as computable processes which are an ordinary part of a world with computable laws of physics, classical game theory treats players as special objects. For example, to describe a problem in game-theoretic terms, we must provide an explicit list of all relevant players, even though in the real world, “players” are ordinary physical objects, not fundamentally distinct from objects such as rocks or clouds.

In this paper, we show that it is possible to define a certain kind of probabilistic oracle (that is, an oracle which answers some queries non-deterministically) such that a Turing machine with access to this oracle can perform perfect Bayesian reasoning about environments that can themselves be described as oracle machines with access to the same oracle. This makes it possible for players to treat opponents simply as an ordinary part of this environment.

When an environment contains multiple agents playing a game against each other, the probabilistic behavior of the oracle may cause the players’ behavior to be probabilistic as well. We show that in this case, the players will always play a Nash equilibrium, and for every particular Nash equilibrium there is an oracle that causes the players to behave according to this equilibrium. In this sense, our work can be seen as providing a foundation for classical game theory, demonstrating that the special treatment of players in the classical theory is not fundamental.

The oracles we consider are not halting oracles; instead, roughly speaking, they allow oracle machines with access to such an oracle to determine the probability distribution of outputs of other machines with access to the same oracle. Because of their ability to deal with self-reference, we refer to these oracles as reflective oracles.

2  Reflective  Oracles 

In many situations, programs would like to predict the output of other programs. They could simulate the other program in order to do this. However, this method fails when there are cycles (e.g. program A is concerned with the output of program B which is concerned with the output of program A). Furthermore, if a procedure to determine the output of another program existed, then it would be possible to construct a liar’s paradox of the form “if I return 1, then return 0, otherwise return 1”.

These paradoxes can be resolved by using probabilities. Let $\mathcal{M}$ be the set of probabilistic oracle machines, defined here as Turing machines which can execute special instructions to (i) flip a coin that has an arbitrary rational probability of coming up heads, and to (ii) call an oracle $O$, whose behavior might itself be probabilistic.

Roughly speaking, the oracle answers questions of the form: “Is the probability that machine $M$ returns 1 greater than $p$?” Thus, $O$ takes two inputs, a machine $M \in \mathcal{M}$ and a rational probability $p \in [0,1] \cap \mathbb{Q}$, and returns either 0 or 1. If $M$ is guaranteed to halt and to output either 0 or 1 itself, we want $O(M,p) = 1$ to mean that the probability that $M$ returns 1 (when run with $O$) is at least $p$, and $O(M,p) = 0$ to mean that it is at most $p$; if it is equal to $p$, both conditions are true, and the oracle may answer randomly. In summary,

$$\mathbb{P}(M^O() = 1) > p \implies \mathbb{P}(O(M,p) = 1) = 1$$

$$\mathbb{P}(M^O() = 1) < p \implies \mathbb{P}(O(M,p) = 0) = 1$$

where we write $\mathbb{P}(M^O() = 1)$ for the probability that $M$ returns 1 when run with oracle $O$, and $\mathbb{P}(O(M,p) = 1)$ for the probability that the oracle returns 1 on input $(M,p)$. We assume that different calls to the oracle are stochastically independent events (even if they are about the same pair $(M,p)$); hence, the behavior of an oracle $O$ is fully specified by the probabilities $\mathbb{P}(O(M,p) = 1)$.

Definition A query (with respect to a particular oracle $O$) is a pair $(M,p)$, where $p \in [0,1] \cap \mathbb{Q}$ and $M^O()$ is a probabilistic oracle machine which almost surely halts and returns an element of $\{0,1\}$.

Definition An oracle is called reflective on $R$, where $R$ is a set of queries, if it satisfies the two conditions displayed above for every $(M,p) \in R$. It is called reflective if it is reflective on the set of all queries.
Theorem 2.1. (i) There is a reflective oracle.

(ii) For any oracle $O$ and every set of queries $R$, there is an oracle $O'$ which is reflective on $R$ and satisfies $\mathbb{P}(O'(M,p) = 1) = \mathbb{P}(O(M,p) = 1)$ for all $(M,p) \notin R$.

Proof. For the proof of (ii), see Appendix B; see also Theorem 5.1, which gives a more elementary proof of a special case. Part (i) follows from part (ii) by choosing $R$ to be the set of all queries and letting $O$ be arbitrary. □

As an example, consider the machine given by $M^O() = 1 - O(M, 0.5)$, which implements a version of the liar paradox by asking the oracle what it will return and then returning the opposite. By the existence theorem, there is an oracle which is reflective on $R = \{(M, 0.5)\}$. This is no contradiction: We can set $\mathbb{P}(O(M,0.5) = 1) = \mathbb{P}(O(M,0.5) = 0) = 0.5$, leading the program to output 1 half the time and 0 the other half of the time.
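The liar example can be checked numerically. The sketch below is illustrative only: it represents the oracle on the single query (M, 1/2) by the probability q with which it answers 1, and scans a grid of candidate values. The grid and the function names are assumptions made for this illustration.

```python
from fractions import Fraction

P_THRESHOLD = Fraction(1, 2)

def liar_output_probability(q):
    """P(M returns 1) when the oracle answers 1 on (M, 1/2) with probability q."""
    # M returns 1 - O(M, 1/2), so it outputs 1 exactly when the oracle says 0.
    return 1 - q

def is_reflective_on_liar(q):
    """Check the two reflection conditions for the single query (M, 1/2)."""
    prob_one = liar_output_probability(q)
    if prob_one > P_THRESHOLD and q != 1:
        return False  # the oracle would have to answer 1 with certainty
    if prob_one < P_THRESHOLD and q != 0:
        return False  # the oracle would have to answer 0 with certainty
    return True

# Candidate oracle behaviors: q = 0, 1/20, ..., 1.
grid = [Fraction(k, 20) for k in range(21)]
consistent = [q for q in grid if is_reflective_on_liar(q)]
print(consistent)
```

On this grid, only q = 1/2 satisfies both conditions: any other value pushes the machine's output probability to one side of the threshold, which would force the oracle to a deterministic answer that the machine then contradicts.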

3  From Reflective Oracles to Causal Decision Theory

We now show how reflective oracles can be used to implement a perfect Bayesian reasoner. We assume that each possible environment that this agent might find itself in can likewise be modeled as an oracle machine; that is, we assume that the laws of physics are computable by a probabilistic Turing machine with access to the same reflective oracle as the agent. For example, we might imagine our agent as being embedded in a Turing-complete probabilistic cellular automaton, whose laws are specified in terms of the oracle.

We assume that each of the agent’s hypotheses about which environments it finds itself in can be modeled by a (possibly probabilistic) “world program” $H^O()$, which simulates this environment and returns a description of what happened. We can then define a machine $W^O()$ which samples a hypothesis $H$ according to the agent’s probability distribution and runs $H^O()$. In the sequel, we will talk about $W^O()$ as if it refers to a particular environment, but this machine is assumed to incorporate subjective uncertainty about the laws of physics and the initial state of the world.

We further assume that the agent’s decision-making process, $A^O()$, can be modeled as a probabilistic oracle machine embedded in this environment. As a simple example, consider the world program

$$W^O() = \begin{cases} \$20 & \text{if } A^O() = 0 \\ \$15 & \text{otherwise} \end{cases}$$

In this world, the outcome is \$20 (which in this case means the agent receives \$20) if the agent chooses action 0 and \$15 if the agent chooses action 1.

Our task is to find an appropriate implementation of $A^O()$. Here, we consider agents implementing causal decision theory (CDT) [11], which evaluates actions according to the consequences they cause: For example, if the agent is a robot embedded in a cellular automaton, it might evaluate the expected utility of taking action 0 or 1 by simulating what would happen in the environment if the output signal of its decision-making component were replaced by either 0 or 1.

We will assume that the agent’s model of the counterfactual consequences of taking different actions $a$ is described by a machine $W_A^O(a)$, satisfying $W^O() = W_A^O(A^O())$, since in the real world, the agent takes action $a = A^O()$. In our example,

$$W_A^O(a) = \begin{cases} \$20 & \text{if } a = 0 \\ \$15 & \text{otherwise} \end{cases}$$


We assume that the agent has a utility function over outcomes, $u(\cdot)$, implemented as a lookup table, which takes rational values in $[0, 1]$.² Furthermore, we assume that both $W_A^O(0)$ and $W_A^O(1)$ halt almost surely and return a value in the domain of $u(\cdot)$. Causal decision theory then prescribes choosing the action that maximizes expected utility; in other words, we want to find an $A^O()$ such that

$$A^O() = \operatorname{argmax}_a \, \mathbb{E}\left[u(W_A^O(a))\right]$$

In the case of ties, any action maximizing utility is allowed, and it is acceptable for $A^O()$ to randomize.

We cannot compute this expectation by simply running $u(W_A^O(a))$ many times to obtain samples, since the environment might contain other agents of the same type, potentially leading to infinite loops. However, we can find an optimal action by making use of a reflective oracle. This is easiest when the agent has only two actions (0 and 1), but similar analysis extends to any number of actions. Define a machine

$$E^O() := \mathrm{flip}\left(\frac{u(W_A^O(1)) - u(W_A^O(0)) + 1}{2}\right)$$

where $\mathrm{flip}(p)$ is a probabilistic function that returns 1 with probability $p$ and 0 with probability $1 - p$.

Theorem 3.1. $O$ is reflective on $\{(E, 1/2)\}$ if and only if $A^O() := O(E, 1/2)$ returns a utility-maximizing action.
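To make the construction concrete, here is a minimal sketch that applies it to the \$20/\$15 example world. The utility values 1 and 3/4 are hypothetical (the text only requires rational utilities in [0, 1]), and `reflective_answers` is an illustrative helper that lists which oracle answers the reflection conditions permit; it is not an implementation of a reflective oracle.

```python
from fractions import Fraction

# Hypothetical utilities in [0, 1] for the two outcomes of the example world.
UTILITY = {"$20": Fraction(1), "$15": Fraction(3, 4)}

def world_given_action(action):
    """Counterfactual world program from the running example."""
    return "$20" if action == 0 else "$15"

def prob_E_returns_one():
    """P(E() = 1) = E[(u(W(1)) - u(W(0)) + 1) / 2]; deterministic in this world."""
    u1 = UTILITY[world_given_action(1)]
    u0 = UTILITY[world_given_action(0)]
    return (u1 - u0 + 1) / 2

def reflective_answers(prob, threshold=Fraction(1, 2)):
    """Set of oracle answers on (E, 1/2) permitted by the reflection conditions."""
    if prob > threshold:
        return {1}
    if prob < threshold:
        return {0}
    return {0, 1}  # tie: the oracle may randomize

p = prob_E_returns_one()
allowed_actions = reflective_answers(p)
print(p, allowed_actions)
```

Under these assumed utilities the flip probability is 3/8, which lies below 1/2, so an oracle that is reflective on the query must answer 0 and the agent takes action 0, the action that yields \$20.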

Proof. The demand that $A^O()$ return a utility-maximizing action is equivalent to

$$\mathbb{E}[u(W_A^O(1))] > \mathbb{E}[u(W_A^O(0))] \implies A^O() = 1$$

$$\mathbb{E}[u(W_A^O(1))] < \mathbb{E}[u(W_A^O(0))] \implies A^O() = 0$$

We have

$$\mathbb{P}(E^O() = 1) = \mathbb{E}\left[\frac{u(W_A^O(1)) - u(W_A^O(0)) + 1}{2}\right]$$

It is not difficult to check that $\mathbb{E}[u(W_A^O(1))] \gtrless \mathbb{E}[u(W_A^O(0))]$ iff $\mathbb{P}(E^O() = 1) \gtrless 1/2$. Together with the definition of $A^O()$, we can use this to rewrite the above conditions as

$$\mathbb{P}(E^O() = 1) > 1/2 \implies O(E, 1/2) = 1$$

$$\mathbb{P}(E^O() = 1) < 1/2 \implies O(E, 1/2) = 0$$

But this is precisely the definition of “$O$ is reflective on $\{(E, 1/2)\}$”. □

²Since the meaning of utility functions is invariant under affine transformations, the choice of the particular interval $[0, 1]$ is no restriction.
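The step described in the proof as not difficult to check can be written out as follows. This is a short worked derivation using only linearity of expectation; the notation $W_A^O(a)$ for the counterfactual world program is assumed here.

```latex
\begin{align*}
\mathbb{P}(E^O() = 1) - \tfrac{1}{2}
  &= \mathbb{E}\!\left[\frac{u(W_A^O(1)) - u(W_A^O(0)) + 1}{2}\right] - \frac{1}{2} \\
  &= \frac{\mathbb{E}[u(W_A^O(1))] - \mathbb{E}[u(W_A^O(0))]}{2}.
\end{align*}
```

The left-hand side therefore has the same sign as the difference in expected utilities. Since $u$ takes values in $[0, 1]$, the argument of flip always lies in $[0, 1]$, so it is a valid probability.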

In  order  to  handle  agents  which  can  choose  between 
more  than  two  actions,  we  can  compare  action  0  to 
action  1,  then  compare  action  2  to  the  best  of  actions 
0  and  1,  then  compare  action  3  to  the  best  of  the  first 
three  actions,  and  so  on.  Adding  more  actions  in  this 
fashion  does  not  substantially  change  the  analysis. 
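The sequential comparison described above can be sketched as a simple tournament. In this illustration, `prefers_second` is a hypothetical stand-in for one oracle call per pairwise comparison; the toy version below compares fixed expected utilities directly instead of querying a reflective oracle.

```python
def best_action(num_actions, prefers_second):
    """Sequential tournament: keep a champion and challenge it with each next action.

    `prefers_second(a, b)` is a hypothetical stand-in for one oracle call;
    it returns 1 if action b should replace the current champion a.
    """
    champion = 0
    for challenger in range(1, num_actions):
        if prefers_second(champion, challenger) == 1:
            champion = challenger
    return champion

# Toy stand-in for the oracle: compare fixed (made-up) expected utilities.
expected_utility = [0.2, 0.7, 0.4, 0.9, 0.1]

def toy_oracle(a, b):
    return 1 if expected_utility[b] > expected_utility[a] else 0

chosen = best_action(len(expected_utility), toy_oracle)
print(chosen)
```

With n actions the loop makes n - 1 comparisons, one per additional action.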

4  From Causal Decision Theory to Nash Equilibria

Since we have taken care to define our agents’ world models $W_A^O(a)$ in such a way that they can embed other agents,³ we need not do anything special to pass from single-agent to multi-agent settings. As in the single-agent case, we model the environment by a program $W^O()$ that contains embedded agent programs $A_1^O, \dots, A_n^O$ and returns an outcome. We can make the dependency on the agent programs explicit by writing $W^O() = F^O(A_1^O(), \dots, A_n^O())$ for some oracle machine $F^O(\cdot)$. This allows us to define machines

$$W_i^O(a_i) := F^O(a_i, A_{-i}^O()) := F^O(A_1^O(), \dots, A_{i-1}^O(), a_i, A_{i+1}^O(), \dots, A_n^O()),$$

representing the causal effects of player $i$ taking action $a_i$.

We assume that each agent has a utility function $u_i(\cdot)$ of the same type as in the previous section. Hence, we can define the agent programs $A_i^O()$ just as before:

$$A_i^O() = O(E_i, 1/2)$$

$$E_i^O() := \mathrm{flip}\left(\frac{u_i(W_i^O(1)) - u_i(W_i^O(0)) + 1}{2}\right)$$

Here, each $E_i^O()$ calls $W_i^O(\cdot)$, which calls $A_j^O()$ for each $j \neq i$, which refers to the source code of $E_j^O()$, but again, Kleene’s second recursion theorem shows that this kind of self-reference poses no theoretical problem [9].

This  setup  very  much  resembles  the  setting  of 
normal-form  games.  In  fact: 

³More precisely, we have only required that $W_A^O(a)$ always halt and produce a value in the domain of the utility function $u(\cdot)$. Since all our agents do is to perform a single oracle call, they always halt, making them safe to call from $W_A^O(a)$.


Theorem 4.1. Given an oracle $O$, consider the $n$-player normal-form game in which the payoff of player $i$, given the pure strategy profile $(a_1, \dots, a_n)$, is $\mathbb{E}[u_i(F^O(a_1, \dots, a_n))]$. The mixed strategy profile given by $s_i := \mathbb{P}(A_i^O() = 1)$ is a Nash equilibrium of this game if and only if $O$ is reflective on $\{(E_1, 1/2), \dots, (E_n, 1/2)\}$.

Proof. For $(s_1, \dots, s_n)$ to be a Nash equilibrium is equivalent to every player’s mixed strategy being a best response; i.e., a pure strategy $a_i$ can only be assigned positive probability if it maximizes

$$\mathbb{E}[u_i(F^O(a_i, A_{-i}^O()))] = \mathbb{E}[u_i(W_i^O(a_i))]$$

By an application of Theorem 3.1 to each player $i$, this is equivalent to $O$ being reflective on $\{(E_i, 1/2)\}$ for every $i$. □

Note that, in particular, any normal-form game with rational-valued payoffs can be represented in this way by simply choosing $F^O$ to be the identity function. In this case, the theorem shows that every reflective oracle (which exists by Theorem 2.1(i)) gives rise to a Nash equilibrium. In the other direction, Theorem 4.1 together with Theorem 2.1(ii) show that for any Nash equilibrium $(s_1, \dots, s_n)$ of the normal-form game, there is a reflective oracle such that $\mathbb{P}(A_i^O() = 1) = s_i$.
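As an illustration of this correspondence, the following sketch works through Matching Pennies. It assumes payoffs of 1 for winning and 0 for losing, encodes the two actions as 0 and 1, and represents the oracle by the pair (s1, s2), where s_i is the probability that it answers 1 on (E_i, 1/2). The grid search and the function names are assumptions made for this example.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def prob_E(player, s1, s2):
    """P(E_i() = 1) in Matching Pennies with payoffs 1 (win) and 0 (lose)."""
    if player == 1:
        # Player 1 wins on a match: E[u_1(W_1(a))] = P(player 2 plays a).
        u_if_1, u_if_0 = s2, 1 - s2
    else:
        # Player 2 wins on a mismatch: E[u_2(W_2(a))] = P(player 1 plays 1 - a).
        u_if_1, u_if_0 = 1 - s1, s1
    return (u_if_1 - u_if_0 + 1) / 2

def reflective(s1, s2):
    """Reflection conditions on {(E_1, 1/2), (E_2, 1/2)} with s_i = P(O(E_i, 1/2) = 1)."""
    for player, s in ((1, s1), (2, s2)):
        p = prob_E(player, s1, s2)
        if p > HALF and s != 1:
            return False
        if p < HALF and s != 0:
            return False
    return True

grid = [Fraction(k, 10) for k in range(11)]
profiles = [(a, b) for a in grid for b in grid if reflective(a, b)]
print(profiles)
```

On this grid the only profile that passes is (1/2, 1/2), the unique Nash equilibrium of Matching Pennies mentioned in the introduction.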

5  From  Nash  Equilibria  to  Reflective 
Oracles 

In the previous section, we showed that a reflective oracle can be used to find Nash equilibria in arbitrary normal-form games. It is interesting to note that we can also go in the other direction: For finite sets $R$ satisfying certain conditions, we can construct normal-form games $G_R$ such that the existence of oracles reflective on $R$ follows from the existence of Nash equilibria in $G_R$. This existence theorem is a special case of Theorem 2.1, but it provides a more elementary proof and also a constructive way of finding such oracles (by applying any algorithm for finding Nash equilibria to $G_R$).

Definition A set $R$ of queries is closed if for every $(M,p) \in R$ and every oracle $O$, $M^O()$ is guaranteed to only invoke the oracle on pairs $(N,q) \in R$. It is bounded if there is some bound $B_R \in \mathbb{N}$ such that for every $(M,p) \in R$ and every oracle $O$, $M^O()$ is guaranteed to invoke the oracle at most $B_R$ times.

Definition Given a finite set $R = \{(M_1,p_1), \dots, (M_n,p_n)\}$ and a vector $x \in [0,1]^n$, define $O_x$ to be the oracle satisfying $\mathbb{P}(O_x(M_i,p_i) = 1) = x_i$ for $i = 1, \dots, n$, and $\mathbb{P}(O_x(M,p) = 1) = 0$ for $(M,p) \notin R$.
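A table-driven oracle of this kind is straightforward to sketch as a sampler. The class below is an illustration only: the class name, the use of strings to stand for machines, and the fixed seed are all assumptions made here.

```python
import random

class TableOracle:
    """Oracle O_x: answers 1 on the i-th query of R with probability x[i], else 0."""

    def __init__(self, queries, x, seed=0):
        if len(queries) != len(x):
            raise ValueError("need one probability per query")
        self._prob = dict(zip(queries, x))
        self._rng = random.Random(seed)  # seeded only to make the sketch repeatable

    def __call__(self, query):
        # Queries outside R get probability 0, matching the definition.
        p = self._prob.get(query, 0.0)
        # Each call draws fresh randomness, so repeated calls are independent.
        return 1 if self._rng.random() < p else 0

# Machines are represented by placeholder names in this sketch.
oracle = TableOracle(queries=[("M1", 0.5), ("M2", 0.25)], x=[1.0, 0.0])
print(oracle(("M1", 0.5)), oracle(("M2", 0.25)), oracle(("M3", 0.5)))
```

Because every call draws a new random number, two calls on the same query are independent, in line with the independence assumption made in Section 2.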

Theorem 5.1. For any finite, closed, bounded set $R = \{(M_1,p_1), \dots, (M_n,p_n)\}$, there is a normal-form game $G_R$ with $m := n \cdot (2B_R + 1)$ players, each of which has two pure strategies, such that for any Nash equilibrium strategy profile $(s_1, \dots, s_m)$, the oracle $O_x$ with $x := (s_1, \dots, s_n)$ is reflective on $R$.

Proof. We divide the $n \cdot (2B_R + 1)$ players in our game into three sets: the main players $i = 1, \dots, n$, the copy players $g(i,j) := j \cdot n + i$, and the auxiliary players $h(i,j) := (B_R + j) \cdot n + i$, for $i = 1, \dots, n$, $j = 1, \dots, B_R$.
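The indexing scheme can be sanity-checked with a few lines of code. The sketch below uses illustrative function names and small made-up values of n and the bound; it confirms that the three groups together cover the indices 1 through n(2B + 1) exactly once.

```python
def copy_player(i, j, n):
    """Index g(i, j) = j*n + i of the j-th copy of main player i."""
    return j * n + i

def auxiliary_player(i, j, n, bound):
    """Index h(i, j) = (B + j)*n + i of the auxiliary player paired with g(i, j)."""
    return (bound + j) * n + i

def all_player_indices(n, bound):
    """Main players, then copy players, then auxiliary players."""
    main = list(range(1, n + 1))
    copies = [copy_player(i, j, n)
              for j in range(1, bound + 1) for i in range(1, n + 1)]
    aux = [auxiliary_player(i, j, n, bound)
           for j in range(1, bound + 1) for i in range(1, n + 1)]
    return main + copies + aux

indices = all_player_indices(n=3, bound=2)
print(indices)
```

For n = 3 and a bound of 2 this lists the integers 1 through 15, so no index is skipped or used twice.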

The mixed strategy $s_i$ of a main player $i$ will determine the probability that $O_x(M_i,p_i) = 1$. We will force $s_{g(i,j)} = s_i$, i.e., we will force the mixed strategy of each copy player to equal that of the corresponding main player; thus, the copy players will provide us with independent samples from the Bernoulli($s_i$) distribution, allowing us to simulate up to $B_R$ independent calls to $O(M_i,p_i)$. Finally, the auxiliary players are used to enforce the constraint $s_{g(i,j)} = s_i$, by having the copy player $g(i,j)$ play a variant of Matching Pennies against the auxiliary player $h(i,j)$.

In order to define the game’s payoff function, note first that by writing out each possible way that the at most $B_R$ oracle calls of $M_i^{O_x}()$ might come out, we can write the probability that this machine returns 1 as a polynomial,

$$\mathbb{P}(M_i^{O_x}() = 1) = \sum_{k=1}^{K} c_{i,k} \prod_{i'=1}^{n} x_{i'}^{d_{i,k,i'}}$$

where $d_{i,k,i'} \le B_R$. We want to force the main player $i$ to choose pure strategy 1 if this probability is strictly greater than $p_i$, pure strategy 0 if it is strictly smaller.
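The idea of writing out each possible way the oracle calls might come out can be sketched as an exact enumeration. In the sketch below, a machine is modeled as a deterministic Python function that receives an `oracle` callback taking the index of a query in R; this interface, the exception used for branching, and the example machine are assumptions made for illustration.

```python
from fractions import Fraction

class NeedMoreAnswers(Exception):
    """Raised when the machine asks for an oracle answer that is not yet scripted."""
    def __init__(self, query_index):
        self.query_index = query_index

def prob_returns_one(machine, x):
    """Exact P(machine returns 1) if a call on query i answers 1 with probability x[i]."""
    total = Fraction(0)
    stack = [((), Fraction(1))]  # (scripted answers so far, probability of that script)
    while stack:
        script, weight = stack.pop()
        position = 0

        def oracle(query_index):
            nonlocal position
            if position == len(script):
                raise NeedMoreAnswers(query_index)
            answer = script[position]
            position += 1
            return answer

        try:
            result = machine(oracle)
        except NeedMoreAnswers as need:
            # Branch on both possible answers to the next oracle call.
            p = x[need.query_index]
            stack.append((script + (1,), weight * p))
            stack.append((script + (0,), weight * (1 - p)))
            continue
        if result == 1:
            total += weight
    return total

def both_calls_agree(oracle):
    """Example machine: asks the same query twice and reports whether the answers match."""
    return 1 if oracle(0) == oracle(0) else 0

print(prob_returns_one(both_calls_agree, [Fraction(1, 3)]))
```

For the example machine the result is x² + (1 - x)², a polynomial in x of degree 2, which equals the number of oracle calls the machine makes. The enumeration terminates because the machine makes a bounded number of calls.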

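To do so, we set player $i$’s payoff function $u_i(\vec{a})$ to

$$u_i(\vec{a}) = \begin{cases} \sum_{k=1}^{K} f_{i,k}(\vec{a}) & \text{if } a_i = 1 \\ p_i & \text{otherwise} \end{cases}$$

where

$$f_{i,k}(\vec{a}) = \begin{cases} c_{i,k} & \text{if } a_{g(i',j)} = 1 \;\; \forall i', \; 1 \le j \le d_{i,k,i'} \\ 0 & \text{otherwise} \end{cases}$$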

Then, assuming we can guarantee $s_{g(i',j)} = s_{i'}$, the expected payoff of strategy 1 to player $i$ is exactly $\mathbb{P}(M_i^{O_x}() = 1)$, while the payoff of strategy 0 is always $p_i$; hence, as desired, the Nash equilibrium conditions force $i$ to choose 1 if the probability is greater than $p_i$, 0 if it is smaller.

It remains to choose the payoffs $(u_{g(i,j)}(\vec{a}), u_{h(i,j)}(\vec{a}))$ of the copy and auxiliary players. In order to force $s_{g(i,j)} = s_i$, we set these payoffs as follows:

a_i = 0:
                     a_h(i,j) = 0    a_h(i,j) = 1
    a_g(i,j) = 0        (1,0)           (0,0)
    a_g(i,j) = 1        (0,1)           (1,0)

a_i = 1:
                     a_h(i,j) = 0    a_h(i,j) = 1
    a_g(i,j) = 0        (1,0)           (0,1)
    a_g(i,j) = 1        (0,0)           (1,0)

We show in Appendix A that at Nash equilibrium, these payoffs force $s_{g(i,j)} = s_i$. □

Theorem 5.1 is a special case of Theorem 2.1(i). The proof can be adapted to also show an analog of Theorem 2.1(ii), but we omit the details here.

6  Related  Work 

Joyce and Gibbard [12] describe one justification for mixed Nash equilibria in terms of causal decision theory. Specifically, they discuss a self-ratification condition that extends CDT to cases when one’s action is evidence of different underlying conditions that might change which actions are rational. An action self-ratifies if and only if it causally maximizes expected utility in a world model that has been updated on the evidence that this action is taken.

For example, consider the setting of a matching pennies game where players can predict each other accurately. The fact that player A plays “heads” is evidence that player B will predict that player A will play “heads” and play “tails” in response, so player A would then have preferred to play “tails”, and so the “heads” action would fail to self-ratify. However, the mixed strategy of flipping the coin would self-ratify. Our reflection principle encodes some global constraints on players’ mixed strategies that are similar to self-ratification.

The question of how to model agents as an ordinary part of the environment is of interest in the speculative study of human-level and smarter-than-human artificial intelligence [13, 14]. Although such systems are still firmly in the domain of futurism, there has been a recent wave of interest in foundational research aimed at understanding their behavior, in order to ensure that they will behave as intended if and when they are developed [15, 16, 11].

Theoretical models of smarter-than-human intelligence such as Hutter’s universally intelligent agent AIXI [17] typically treat the agent as separate from the environment, communicating only through well-defined input and output channels. In the real world, agents run on hardware that is part of the environment, and Orseau and Ring [13] have proposed formalisms for studying space-time embedded intelligence running on hardware that is embedded in its environment. Our formalism might be useful for studying idealized models of agents embedded in their environment: While real agents must be boundedly rational, the ability to study perfectly Bayesian space-time embedded intelligence might help to clarify which aspects of realistic systems are due to bounded rationality, and which are due to the fact that real agents aren’t cleanly separated from their environment.

7  Conclusions  and  Future  Work 

In this paper, we have introduced reflective oracles, a type of probabilistic oracle which is able to answer questions about the behavior of oracle machines with access to the same oracle. We’ve shown that such oracle machines can implement a version of causal decision theory, and used this to establish a close relationship between reflective oracles and Nash equilibria.

We have focused on answering queries about oracle machines that halt with probability 1, but the reflection principle presented in Section 2 can be modified to apply to machines that do not necessarily halt. To do so, we replace the condition

$$\mathbb{P}(M^O() = 1) < p \implies \mathbb{P}(O(M,p) = 0) = 1$$

by the condition

$$\mathbb{P}(M^O() \neq 0) < p \implies \mathbb{P}(O(M,p) = 0) = 1$$

This is identical to the former principle if $M^O()$ is guaranteed to halt, but provides sensible information even if there is a chance that $M^O()$ loops. Appendix B proves the existence of reflective oracles satisfying this stronger reflection principle.

The ability to deal with non-halting machines opens up the possibility of applying reflective oracles to simplicity priors such as Solomonoff induction [18], which defines a probability distribution over infinite bit sequences by, roughly, choosing a random program and running it. Solomonoff induction deals with computable hypotheses, but is itself uncomputable (albeit computably approximable) because it must deal with the possibility that a randomly chosen program may go into an infinite loop after writing only a finite number of bits on its output tape. A reflective oracle version of Solomonoff induction would be able to deal with a hypothesis space consisting of arbitrary oracle machines, while itself being implementable as an oracle machine; this would make it possible to model a predictor which predicts an environment it is itself embedded in. We leave details to future work.
APPENDIX

A  Nash  Equilibria  in  a  Variant  of 
Matching  Pennies 

Lemma A.1. Consider an $n$-player game with three distinguished players, each of which has two pure strategies: Player Row has strategies Up and Down, player Column has strategies Left and Right, and player Matrix has strategies Front and Back. Suppose that the payoffs of (Row, Column) depend only on the strategies of these three players, as follows:

Front:
              Left     Right
    Up       (1,0)    (0,0)
    Down     (0,1)    (1,0)

Back:
              Left     Right
    Up       (1,0)    (0,1)
    Down     (0,0)    (1,0)

where the first matrix indicates the payoffs when Matrix plays Front, and the second matrix indicates the payoffs when Matrix plays Back.

Write  p  for  the  probability  that  Row  plays  Down, 
and  q  for  the  probability  that  Matrix  plays  Back.  At 
Nash  equilibrium,  we  have  p  =  q. 

Proof. • Case 1: $0 < q < 1$.

Suppose that there is a Nash equilibrium where Column plays Left. Then Row would play Up, but then Column would strictly prefer Right, which is a contradiction.

Suppose that there is a Nash equilibrium where Column plays Right. Then Row would play Down, but then Column would strictly prefer Left, which is a contradiction.

Thus, at every Nash equilibrium, Column must mix between strategies. Hence, at equilibrium, Column must be indifferent between Left and Right. This is equivalent to $p(1 - q) = (1 - p)q$. This implies $p > 0$, since otherwise we’d have $0(1 - q) = (1 - 0)q$, i.e. $0 = q$, but we assumed $0 < q < 1$. Thus, we can divide the equation by $pq$, yielding:

$$(1 - q)/q = (1 - p)/p$$
$$1/q - 1 = 1/p - 1$$
$$1/q = 1/p$$
$$q = p$$

•  Case  2:  g  =  0. 

This gives us the following payoff matrix:

            Left     Right
    Up      (1,0)    (0,0)
    Down    (0,1)    (1,0)

Suppose that there is a Nash equilibrium with p >
0. Then at this equilibrium, Column must play
Left; but if Column plays Left, then Row strictly
prefers Up, which contradicts p > 0. Hence, we
must have p = 0 = q.

• Case 3: q = 1.

This gives us the following payoff matrix:

            Left     Right
    Up      (1,0)    (0,1)
    Down    (0,0)    (1,0)

Suppose that there is a Nash equilibrium with
p < 1. Then at this equilibrium, Column must
play Right; but if Column plays Right, then Row
strictly prefers Down, which contradicts p < 1.
Hence, we must have p = 1 = q.

□ 
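
The lemma can be sanity-checked numerically. The following Python sketch is an illustration only, not part of the paper. It assumes the payoff entries are (Row, Column) pairs in which Row is paid 1 when its move matches Column's (Up with Left, Down with Right), and Column is paid 1 only at (Down, Left, Front) and (Up, Right, Back). Matrix's mixing probability q is held fixed, since the lemma only uses the best-response conditions of Row and Column. The function names and the rational grid search are hypothetical choices made for this sketch.

```python
from fractions import Fraction
from itertools import product

# Assumed payoffs (row_payoff, column_payoff), keyed by (Row action, Column action).
FRONT = {("Up", "Left"): (1, 0), ("Up", "Right"): (0, 0),
         ("Down", "Left"): (0, 1), ("Down", "Right"): (1, 0)}
BACK = {("Up", "Left"): (1, 0), ("Up", "Right"): (0, 1),
        ("Down", "Left"): (0, 0), ("Down", "Right"): (1, 0)}


def row_payoff(action, c, q):
    """Expected payoff to Row for a pure action when Column plays Right
    with probability c and Matrix plays Back with probability q."""
    total = Fraction(0)
    for col, pc in (("Left", 1 - c), ("Right", c)):
        for matrix, pm in ((FRONT, 1 - q), (BACK, q)):
            total += pc * pm * matrix[(action, col)][0]
    return total


def column_payoff(action, p, q):
    """Expected payoff to Column for a pure action when Row plays Down
    with probability p and Matrix plays Back with probability q."""
    total = Fraction(0)
    for row, pr in (("Up", 1 - p), ("Down", p)):
        for matrix, pm in ((FRONT, 1 - q), (BACK, q)):
            total += pr * pm * matrix[(row, action)][1]
    return total


def is_best_response(prob_second, payoff_first, payoff_second):
    """A mixed strategy is a best response iff it puts positive weight
    only on pure actions with maximal expected payoff."""
    if prob_second > 0 and payoff_second < payoff_first:
        return False
    if prob_second < 1 and payoff_first < payoff_second:
        return False
    return True


def equilibria(q, steps=10):
    """All grid points (p, c) at which Row and Column both best-respond."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    found = []
    for p, c in product(grid, grid):
        row_ok = is_best_response(
            p, row_payoff("Up", c, q), row_payoff("Down", c, q))
        col_ok = is_best_response(
            c, column_payoff("Left", p, q), column_payoff("Right", p, q))
        if row_ok and col_ok:
            found.append((p, c))
    return found
```

Under these assumptions, every grid equilibrium returned by `equilibria(q)` has its first component equal to `q`, which matches the three cases of the proof.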

B  Proof  of  the  Existence  Theorem 

In this appendix, we prove Theorem 2.1(ii). Thus,
suppose that $R$ is a set of queries and $O$ is some oracle;
we want to show the existence of an oracle $O'$
which is reflective on $R$ and satisfies $P(O'(M,p) = 1) =
P(O(M,p) = 1)$ for all $(M,p) \notin R$.

We will describe the behavior of $O'$ by a pair of
functions, $\mathrm{query} : \mathcal{M} \times ([0,1] \cap \mathbb{Q}) \to [0,1]$ and
$\mathrm{eval} : \mathcal{M} \to [0,1]$. The first of these gives the distribution
of $O'$, i.e., $\mathrm{query}(M,p) = P(O'(M,p) = 1)$. The second
gives the distribution of a machine's behavior under $O'$:
If $M$ almost surely returns either 0 or 1, then $\mathrm{eval}(M) =
P(M^{O'}() = 1)$.

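Function pairs $(\mathrm{query}, \mathrm{eval})$ can be seen as elements
of $A := [0,1]^{\mathcal{M} \times ([0,1] \cap \mathbb{Q})} \times [0,1]^{\mathcal{M}}$, which is a convex and
compact subset of the locally convex topological vector
space $\mathbb{R}^{\mathcal{M} \times ([0,1] \cap \mathbb{Q})} \times \mathbb{R}^{\mathcal{M}}$ (with the product topology).
We now define a correspondence $f : A \to \mathrm{Pow}(A)$, such
that fixed points $(\mathrm{query}, \mathrm{eval}) \in f(\mathrm{query}, \mathrm{eval})$ yield oracles
$O'$ of the desired form.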

We define $f$ by giving a set of necessary and sufficient
conditions for $(\mathrm{query}', \mathrm{eval}') \in f(\mathrm{query}, \mathrm{eval})$. We
place three conditions on $\mathrm{query}'(M,p)$: If $(M,p) \in R$
and $\mathrm{eval}(M) > p$, then $\mathrm{query}'(M,p) = 1$; if $(M,p) \in R$
and $\mathrm{eval}(M) < p$, then $\mathrm{query}'(M,p) = 0$; and if
$(M,p) \notin R$, then $\mathrm{query}'(M,p) = P(O(M,p) = 1)$.
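
The three conditions leave the value of query'(M,p) unconstrained exactly when (M,p) is in R and eval(M) = p. A minimal Python sketch of the resulting set of admissible values, written as a closed interval, may make this concrete. The function name, the boolean flag `in_R`, and the argument `base_prob` (standing for P(O(M,p) = 1)) are hypothetical names introduced for this illustration.

```python
from fractions import Fraction


def admissible_query_values(in_R, eval_M, p, base_prob):
    """Return (lo, hi), the closed interval of values that query'(M, p)
    may take according to the three conditions stated above."""
    if not in_R:
        # Outside R, the new oracle must copy the distribution of O.
        return (base_prob, base_prob)
    if eval_M > p:
        return (Fraction(1), Fraction(1))
    if eval_M < p:
        return (Fraction(0), Fraction(0))
    # (M, p) in R and eval(M) == p: no condition applies.
    return (Fraction(0), Fraction(1))
```

In every case the admissible set is a non-empty closed interval, which is one way to see why the query component of f(query, eval) is non-empty, closed and convex.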

To describe the conditions on $\mathrm{eval}'(M)$, we will consider
the definition of “probabilistic oracle machine” to
include the initial state of the machine's working tapes,
so that we can view the state of a machine $M^{O}()$ after
one step of computation as a new machine $N^{O}()$.
Then, any machine $M$ can be classified as performing
one of the following operations as its first step of computation:
(i) a deterministic computation step, yielding
a new state $N$, in which case $\mathrm{eval}'(M) = \mathrm{eval}(N)$; (ii) a
coin flip, yielding a state $N$ with a rational probability
$p$ and another state $N'$ with probability $1 - p$, in
which case $\mathrm{eval}'(M) = p \cdot \mathrm{eval}(N) + (1 - p) \cdot \mathrm{eval}(N')$;


(iii) halting, with the output tape containing 0 (in which
case $\mathrm{eval}'(M) = 0$) or 1 (in which case $\mathrm{eval}'(M) = 1$)
or some other output (in which case $\mathrm{eval}'(M)$ is arbitrary);
or (iv) an invocation of the oracle on a pair
$(M', p)$, yielding a new state $N$ if the oracle returns 0
and a different new state $N'$ if it returns 1. In the
last case, writing $q := \mathrm{query}(M',p)$, the condition is
$\mathrm{eval}'(M) = (1 - q) \cdot \mathrm{eval}(N) + q \cdot \mathrm{eval}(N')$.
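
The four cases amount to a one-step update rule for eval'. The sketch below encodes them in Python under simplifying assumptions: machine states are identified by plain strings, `eval_` and `query` are dictionaries holding the current function values, and the "arbitrary" value in case (iii) is supplied by the caller. All class and function names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Deterministic:      # case (i)
    next_state: str


@dataclass(frozen=True)
class CoinFlip:           # case (ii)
    prob: Fraction        # probability of reaching state_a
    state_a: str
    state_b: str          # reached with probability 1 - prob


@dataclass(frozen=True)
class Halt:               # case (iii)
    output: str


@dataclass(frozen=True)
class OracleCall:         # case (iv)
    machine: str          # the queried machine M'
    p: Fraction           # the queried threshold
    if_zero: str          # next state when the oracle returns 0
    if_one: str           # next state when the oracle returns 1


def eval_prime(first_step, eval_, query, arbitrary=Fraction(0)):
    """Value required of eval'(M), given the first step of M and the
    current functions eval and query (passed as dictionaries)."""
    if isinstance(first_step, Deterministic):
        return eval_[first_step.next_state]
    if isinstance(first_step, CoinFlip):
        return (first_step.prob * eval_[first_step.state_a]
                + (1 - first_step.prob) * eval_[first_step.state_b])
    if isinstance(first_step, Halt):
        if first_step.output == "0":
            return Fraction(0)
        if first_step.output == "1":
            return Fraction(1)
        return arbitrary  # any value in [0, 1] satisfies the condition
    if isinstance(first_step, OracleCall):
        q = query[(first_step.machine, first_step.p)]
        return (1 - q) * eval_[first_step.if_zero] + q * eval_[first_step.if_one]
    raise TypeError("unknown kind of first step")
```

For example, with eval(N) = 1/4, eval(N') = 3/4 and query(M', 1/2) = 1/3, an oracle call on (M', 1/2) leading to N on answer 0 and to N' on answer 1 gives (2/3)(1/4) + (1/3)(3/4) = 5/12.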

Given a fixed point $(\mathrm{query}, \mathrm{eval}) \in f(\mathrm{query}, \mathrm{eval})$,
define $O'$ by $P(O'(M,p) = 1) = \mathrm{query}(M,p)$. Then, it
can be shown by induction that for every $T \in \mathbb{N}$ and
every $M \in \mathcal{M}$, $\mathrm{eval}(M)$ is at least the probability that $M^{O'}()$
returns 1 after at most $T$ timesteps, and at most the probability
that it does not return 0 within this
time bound; in the limit, we obtain

$$P(M^{O'}() = 1) \le \mathrm{eval}(M) \le P(M^{O'}() \ne 0)$$

Together with the conditions on $\mathrm{query}(M,p)$, this shows
that

$$P(M^{O'}() = 1) > p \implies P(O'(M,p) = 1) = 1$$

$$P(M^{O'}() = 0) > (1 - p) \implies P(O'(M,p) = 0) = 1$$

which is a strengthening of the conditions of Section 2:
it is equivalent in the case where $M^{O'}()$ halts with probability
1, but provides information even if $M^{O'}()$ may
fail to halt.
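
For readers who want the intermediate steps, the two implications can be unpacked as follows. This is only a restatement of the argument above for a pair $(M,p) \in R$ at a fixed point, reading the event $M^{O'}() \ne 0$ as the complement of $M^{O'}() = 0$ (so that it includes non-termination).

```latex
\begin{align*}
P(M^{O'}() = 1) > p
  &\implies \mathrm{eval}(M) \ge P(M^{O'}() = 1) > p \\
  &\implies \mathrm{query}(M,p) = 1
   \implies P(O'(M,p) = 1) = 1, \\[4pt]
P(M^{O'}() = 0) > 1 - p
  &\implies \mathrm{eval}(M) \le P(M^{O'}() \ne 0)
     = 1 - P(M^{O'}() = 0) < p \\
  &\implies \mathrm{query}(M,p) = 0
   \implies P(O'(M,p) = 0) = 1.
\end{align*}
```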

It remains to be shown that $f(\cdot)$ has a fixed point.
To do so, we employ the infinite-dimensional generalization
of Kakutani's fixed-point theorem [19].
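
As a small illustration of what such a fixed point looks like (it plays no role in the proof), consider a hypothetical "liar" machine L whose first step is an oracle call on (L, 1/2), after which it halts with output 1 if the oracle returns 0 and with output 0 if the oracle returns 1. Assuming (L, 1/2) is in R, the conditions above force eval(L) = 1 - q with q = query(L, 1/2), and the Python sketch below searches a rational grid for values of q consistent with the conditions on query. The machine, the grid search, and the function names are assumptions made for this example.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def eval_of_liar(q):
    """eval(L) forced by the oracle-call condition: L halts with 1 when the
    oracle answers 0 (probability 1 - q) and with 0 otherwise."""
    return (1 - q) * 1 + q * 0


def query_is_admissible(q, eval_value, p=HALF):
    """Conditions on query(L, p) when (L, p) is in R."""
    if eval_value > p:
        return q == 1
    if eval_value < p:
        return q == 0
    return True  # eval(L) == p: any value of q is allowed


def liar_fixed_points(steps=100):
    """Grid values of q = query(L, 1/2) that satisfy all conditions."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return [q for q in grid if query_is_admissible(q, eval_of_liar(q))]
```

On a grid that contains 1/2, the only consistent value is q = 1/2: a deterministic answer of 0 or 1 would be contradicted by L's behavior, so the oracle has to randomize on this query.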

It is clear from the definition that $f(\mathrm{query}, \mathrm{eval})$ is
non-empty, closed and convex for all $(\mathrm{query}, \mathrm{eval}) \in A$.
Hence, to show that $f$ has a fixed point, it is sufficient
to show that it has closed graph.

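Thus, assume that we have sequences
$(\mathrm{query}_n, \mathrm{eval}_n) \to (\mathrm{query}, \mathrm{eval})$ and
$(\mathrm{query}'_n, \mathrm{eval}'_n) \to (\mathrm{query}', \mathrm{eval}')$, such that
$(\mathrm{query}'_n, \mathrm{eval}'_n) \in f(\mathrm{query}_n, \mathrm{eval}_n)$ for every $n$; we need
to show that then, $(\mathrm{query}', \mathrm{eval}') \in f(\mathrm{query}, \mathrm{eval})$.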

For the conditions on $\mathrm{eval}'$, we can simply take the
limit $n \to \infty$ on both sides of each equation. The condition
on $\mathrm{query}'(M,p)$ for $(M,p) \notin R$ is clearly fulfilled,
since $\mathrm{query}'_n(M,p)$ is constant in this case. The two
remaining conditions on $\mathrm{query}'(M,p)$ are entirely symmetrical;
without loss of generality, consider the case
$\mathrm{eval}(M) > p$, $(M,p) \in R$.

In this case, since $(\mathrm{query}_n, \mathrm{eval}_n) \to (\mathrm{query}, \mathrm{eval})$
and convergence is pointwise, there must be an
$n_0$ such that $\mathrm{eval}_n(M) > p$ for all $n \ge n_0$.
Since $(\mathrm{query}'_n, \mathrm{eval}'_n) \in f(\mathrm{query}_n, \mathrm{eval}_n)$, it follows
that $\mathrm{query}'_n(M,p) = 1$ for all $n \ge n_0$, whence
$\mathrm{query}'(M,p) = 1$ as desired. This completes the proof.


